# PG-MORL

This repository contains the PGMORL baseline used in the paper *Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshot*.

We use the open-source implementation of PGMORL (Xu et al., 2020) for the experiments.

## Installation

#### Prerequisites

- **Python Version**: >= 3.7.4.
- **PyTorch Version**: >= 1.3.0.
- **MuJoCo**: install MuJoCo and mujoco-py version 2.0 by following the instructions in [mujoco-py](<https://github.com/openai/mujoco-py>).
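Before installing, it can help to confirm that these prerequisites are met. The script below is a minimal sketch that is not part of this repository. It assumes PyTorch is importable as `torch`, and it only checks whether `mujoco_py` can be located rather than importing it, because importing mujoco-py may compile its extension on first use.

```python
import importlib.util
import re
import sys

# Minimum versions taken from the prerequisites list above.
MIN_PYTHON = (3, 7, 4)
MIN_TORCH = (1, 3, 0)


def parse_version(text):
    """Turn a string like '1.13.1+cu117' into a tuple of ints, e.g. (1, 13, 1)."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    # Missing minor/patch components are treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(version, minimum):
    """Compare two version tuples component by component."""
    return tuple(version) >= tuple(minimum)


def check_prerequisites():
    """Return a dict mapping each prerequisite to True (OK) or False."""
    results = {"python": meets_minimum(sys.version_info[:3], MIN_PYTHON)}
    try:
        import torch
        results["torch"] = meets_minimum(parse_version(torch.__version__), MIN_TORCH)
    except (ImportError, OSError):
        # A missing or broken PyTorch install counts as not satisfied.
        results["torch"] = False
    # Only locate mujoco_py; do not import it here.
    results["mujoco_py"] = importlib.util.find_spec("mujoco_py") is not None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'missing or too old'}")
```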

#### Install Dependencies

You can install the dependencies either in a conda virtual env (recommended) or manually.

To install with conda, create a virtual env named **pgmorl** with:

```
conda env create -f environment.yml
```

If you prefer to install all the dependencies yourself, open `environment.yml` in an editor to see which packages need to be installed with `pip`.

#### Benchmark Problems

- Continuous Deep Sea Treasure (DST): 2 objectives: `treasure_value` and `action_penalty`.
- Multi-Objective Continuous LunarLander: 4 objectives: main engine cost, side engine cost, shaping reward, and result reward.
- MuJoCo:
  - HalfCheetah: 2 objectives: forward speed and control cost; the control cost is scaled by a factor of 1000.
  - Hopper: 2 objectives: forward speed and control cost; the control cost is scaled by a factor of 1500.
  - Hopper3d: 3 objectives: forward speed, jump reward, and control cost; the control cost is scaled by a factor of 1500. The jump reward is 15 times the difference between the current height and the initial height.
  - Ant: 2 objectives: forward speed and control cost; the control cost is scaled by a factor of 1.
  - Ant3d: 3 objectives: forward speed, control cost, and healthy reward; both the control cost and the healthy reward are scaled by a factor of 1.
  - Walker2d: 2 objectives: forward speed and control cost; the control cost is scaled by a factor of 1000.
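Each benchmark returns a vector-valued reward with one entry per objective, and PGMORL (Xu et al., 2020) optimizes policies for weighted sums of these objectives under different weight vectors. The sketch below illustrates that bookkeeping. `OBJECTIVES`, `jump_reward`, and `scalarize` are hypothetical helpers written for this README, not functions from the codebase; only the objective lists and the factor of 15 in the Hopper3d jump reward come from the description above.

```python
# Hypothetical summary of the objectives listed above (not the repository's code).
OBJECTIVES = {
    "DST": ["treasure_value", "action_penalty"],
    "LunarLander": ["main_engine_cost", "side_engine_cost", "shaping_reward", "result_reward"],
    "HalfCheetah": ["forward_speed", "control_cost"],
    "Hopper": ["forward_speed", "control_cost"],
    "Hopper3d": ["forward_speed", "jump_reward", "control_cost"],
    "Ant": ["forward_speed", "control_cost"],
    "Ant3d": ["forward_speed", "control_cost", "healthy_reward"],
    "Walker2d": ["forward_speed", "control_cost"],
}

# Hopper3d: jump reward = 15 * (current height - initial height).
JUMP_SCALE = 15.0


def jump_reward(current_height, initial_height):
    """Jump reward for Hopper3d as described in the benchmark list."""
    return JUMP_SCALE * (current_height - initial_height)


def scalarize(reward_vector, weights):
    """Linear (weighted-sum) scalarization of a multi-objective reward vector."""
    if len(reward_vector) != len(weights):
        raise ValueError("reward vector and weights must have the same length")
    return sum(r * w for r, w in zip(reward_vector, weights))


if __name__ == "__main__":
    # Illustrative numbers only: one Hopper3d step with made-up values.
    reward = [2.0, jump_reward(1.5, 1.0), -0.5]
    assert len(reward) == len(OBJECTIVES["Hopper3d"])
    print(scalarize(reward, [0.5, 0.25, 0.25]))
```

Sweeping the weight vector (for example over `[w, 1 - w]` for the two-objective tasks) traces out different trade-offs between the objectives.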

#### Train

The main entry point of the training code is `morl/run.py`.
- Enter the project folder:

  ```
  cd PGMORL
  ```

- To run PGMORL on *Walker2d-v2* with random seed 1:

  ```
  python walker2d-v2.py --pgmorl --inputseed 1
  ```

- To run PGMORL with `hopper-v3.py` and random seed 2:

  ```
  python hopper-v3.py --pgmorl --inputseed 2
  ```
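To repeat a run over several seeds, a small launcher can loop over the commands above. This is a hypothetical helper, not part of the repository. It assumes the per-environment scripts (such as `walker2d-v2.py`) sit in the project folder and accept the `--pgmorl` and `--inputseed` flags used in the examples. By default it only prints the commands it would run.

```python
import subprocess
import sys


def build_command(script, seed, python=sys.executable):
    """Assemble the training command from the examples above for one seed."""
    return [python, script, "--pgmorl", "--inputseed", str(seed)]


def run_seeds(script, seeds, dry_run=True):
    """Print one command per seed and, unless dry_run is True, run them sequentially."""
    commands = [build_command(script, seed) for seed in seeds]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop if a run exits with a non-zero status.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_seeds("walker2d-v2.py", seeds=[1, 2, 3])
```

Using `sys.executable` launches each job with the same interpreter as the launcher itself, for example the one from the activated **pgmorl** conda env. Pass `dry_run=False` to actually start the runs.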


## Acknowledgement

We use the implementation of [pytorch-a2c-ppo-acktr-gail](https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail) as the underlying PPO implementation and modify it into our Multi-Objective Policy Gradient algorithm.
We use the open-source implementation of [PGMORL](https://github.com/mit-gfx/PGMORL) as our baseline.


